ARTIFICIAL NEURAL NETWORKS
0. Preparation: review of principal components and PC regression
> attach(USArrests)
> names(USArrests)
[1] "Murder"   "Assault"  "UrbanPop" "Rape"
> states = row.names(USArrests)   # state names are stored as row names; the data set has no named column for them
> pc = prcomp(USArrests, scale=TRUE)
> biplot(pc)
Red vectors are projections of the original X-variables on the space of the first two principal components. We can see that the first principal component Z1 mostly represents the combined crime rate, and the second principal component Z2 mostly represents the level of urbanization.
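The reading of the biplot can be checked numerically. This is a minimal sketch that uses only components returned by prcomp (rotation for the loadings, sdev for the standard deviations); the object names loadings and pve are illustrative and not part of the original session.

```r
# Refit the principal components on the standardized USArrests data
pc = prcomp(USArrests, scale = TRUE)

# Loadings of the four X-variables on Z1 and Z2 (the red vectors in the biplot)
loadings = pc$rotation[, 1:2]
print(round(loadings, 2))

# Proportion of variance explained by each principal component
pve = pc$sdev^2 / sum(pc$sdev^2)
print(round(pve, 3))
```

The three crime variables should show loadings of the same sign and similar size on Z1, while UrbanPop should dominate Z2. The overall sign of each component is arbitrary.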
1. Artificial neural network with no hidden layers is just linear regression
Artificial neural networks are available in package neuralnet.
> library(neuralnet)
Create the training and testing data.
> library(ISLR)   # assumption: the Auto data set is taken from package ISLR
> attach(Auto)
> n = length(mpg)
> Z = sample(n,200)
> Auto.train = Auto[Z,]
> Auto.test = Auto[-Z,]
> nn0 = neuralnet( mpg ~ weight+acceleration+horsepower+cylinders, data=Auto.train, hidden=0 )
> nn0
        Error Reached Threshold Steps
1 1720.963134    0.009532195866  8565
> plot(nn0)
This is the linear regression! The weights are the slopes, and the intercept is shown in blue.
> lm( mpg ~ weight+acceleration+horsepower+cylinders, data=Auto.train )
Coefficients:
 (Intercept)        weight  acceleration    horsepower     cylinders
44.446611467  -0.005475864   0.056149012  -0.015172364  -0.744442628
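Why the two fits agree can be seen in a small self-contained sketch. A network with no hidden layer and a linear output node computes a bias plus a weighted sum of the inputs, so minimizing its squared error is the least-squares problem that lm solves. The sketch makes several assumptions that differ from the session above: it uses the built-in mtcars data instead of Auto so that it runs without extra packages, it standardizes the inputs, and it uses plain gradient descent as a stand-in for the training algorithm of neuralnet (which by default is resilient backpropagation). The learning rate and number of steps are arbitrary.

```r
# Inputs: a bias column of ones plus two standardized predictors from mtcars
X = cbind(1, scale(as.matrix(mtcars[, c("wt", "hp")])))
y = mtcars$mpg

w = rep(0, ncol(X))   # weights of the single linear output node, bias first
rate = 0.1            # learning rate (arbitrary choice)

for (step in 1:5000) {
  # Gradient of the mean squared error with respect to the weights
  grad = -2 * t(X) %*% (y - X %*% w) / length(y)
  w = w - rate * grad
}

# Least-squares coefficients for the same inputs
ols = coef(lm(y ~ X[, -1]))

print(cbind(gradient.descent = as.numeric(w), least.squares = as.numeric(ols)))
```

The two columns should agree to many decimal places, which is the same agreement seen between the weights in plot(nn0) and the lm coefficients.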
2. Artificial neural network
Now, introduce 3 hidden nodes.
> nn3 = neuralnet( mpg ~ weight+acceleration+horsepower+cylinders, data=Auto.train, hidden=3 )
> plot(nn3)
3. Prediction power
Which ANN gives a more accurate prediction? Use the test data for comparison.
> Predict0 = compute( nn0, subset(Auto.test, select=c(weight, acceleration, horsepower, cylinders) ) )
The result is a list with two components: “neurons”, which holds the X-variables (and the outputs of any hidden layers), and “net.result”, the predicted Y-variable.
> names(Predict0)
[1] "neurons"    "net.result"
> mean( (Auto.test$mpg - Predict0$net.result)^2 )
[1] 18.83350028
Prediction MSE of this ANN is 18.83.
> Predict3 = compute( nn3, subset(Auto.test, select=c(weight, acceleration, horsepower, cylinders) ) )
> mean( (Auto.test$mpg - Predict3$net.result)^2 )
[1] 61.84868054
Its 3-node competitor has a much higher prediction MSE, 61.85, on this test sample.
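The two MSE computations repeat the same expression, so a small helper keeps them consistent. This is only a sketch: the name test.mse is hypothetical, and as.numeric() is applied because net.result is returned as a one-column matrix.

```r
# Mean squared prediction error; 'predicted' may be a vector or a one-column matrix
test.mse = function(actual, predicted) {
  mean((actual - as.numeric(predicted))^2)
}

# Toy check with made-up numbers: the errors are 0, 0 and -2, so the MSE is 4/3
actual = c(1, 2, 3)
predicted = matrix(c(1, 2, 5), ncol = 1)
print(test.mse(actual, predicted))
```

With the objects from the session, test.mse(Auto.test$mpg, Predict0$net.result) and test.mse(Auto.test$mpg, Predict3$net.result) would give the two numbers being compared.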
4. Multilayer structure
The number of hidden nodes can be given as a vector whose components give the number of nodes in each hidden layer. For example, hidden=c(3,2) requests 3 nodes in the first hidden layer and 2 nodes in the second.
> nn3.2 = neuralnet( mpg ~ weight+acceleration+horsepower+cylinders, data=Auto.train, hidden=c(3,2) )
> plot(nn3.2)
5. Artificial neural network for classification
> library(nnet)
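Prepare our categorical variables ECO and ECO4, and store them in the data frame so that the training and testing subsets contain them.
> ECO = ifelse( mpg > 22.75, "Economy", "Consuming" )
> ECO4 = rep("Economy",n)
> ECO4[mpg < 29] = "Good"
> ECO4[mpg < 22.75] = "OK"
> ECO4[mpg < 17] = "Consuming"
> Auto$ECO = ECO     # without this, ECO has n values but Auto.train has only 200 rows
> Auto$ECO4 = ECO4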
> Auto.train = Auto[Z,]
> Auto.test = Auto[-Z,]
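The same four classes can also be built in one step with cut(). This is a sketch on a few made-up mpg values (the vectors mpg.demo and ECO4.demo are hypothetical); right=FALSE makes each interval closed on the left, which matches the strict "<" comparisons used above.

```r
# Made-up mpg values, chosen to include the cut points 17, 22.75 and 29
mpg.demo = c(15, 17, 22.75, 28.9, 29, 40)

# Intervals [-Inf,17), [17,22.75), [22.75,29), [29,Inf) mapped to the four labels
ECO4.demo = cut(mpg.demo,
                breaks = c(-Inf, 17, 22.75, 29, Inf),
                labels = c("Consuming", "OK", "Good", "Economy"),
                right = FALSE)

print(data.frame(mpg.demo, ECO4.demo))
```

The result is already a factor, so no as.factor() conversion is needed before modelling.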
Train an artificial neural network to classify cars into “Economy” and “Consuming”.
> nn.class = nnet( as.factor(ECO) ~ weight+acceleration+horsepower+cylinders, data=Auto.train, size=3 )
# weights: 19
initial value 169.471585
final value 138.379332
converged
> summary(nn.class)
a 4-3-1 network with 19 weights
options were - entropy fitting
 b->h1 i1->h1 i2->h1 i3->h1 i4->h1
 -0.11  -0.35   0.12   0.32   0.16
 b->h2 i1->h2 i2->h2 i3->h2 i4->h2
  0.59  -0.37  -0.56   0.67   0.44
 b->h3 i1->h3 i2->h3 i3->h3 i4->h3
  0.50   0.16   0.41  -0.49   0.02
  b->o  h1->o  h2->o  h3->o
 -0.12  -0.27   0.03   0.02
This ANN has p=4 inputs, one layer of M=3 hidden nodes, and a single (K=1) output. We need to estimate M(p+1)+K(M+1) = (3)(5)+(1)(4) = 19 weights.
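The count generalizes to any number of layers: each node receives one weight from every node in the previous layer plus one bias. A minimal sketch, assuming a fully connected network with no skip-layer connections; the function name count.weights is hypothetical.

```r
# Number of weights in a fully connected feed-forward network.
# 'sizes' lists the number of nodes per layer: inputs, hidden layer(s), outputs.
count.weights = function(sizes) {
  from = head(sizes, -1)   # nodes feeding each set of connections
  to   = tail(sizes, -1)   # nodes receiving them
  sum((from + 1) * to)     # the "+ 1" accounts for the bias of each receiving node
}

print(count.weights(c(4, 3, 1)))      # p=4, M=3, K=1: (3)(5)+(1)(4) = 19
print(count.weights(c(4, 3, 2, 1)))   # hidden layers of 3 and 2 nodes: 15+8+3 = 26
```

The second call corresponds to a network with 4 inputs, hidden=c(3,2), and one output, as in Section 4.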
Classification into K > 2 categories is similar.
> nn.class = nnet( as.factor(ECO4) ~ weight+acceleration+horsepower+cylinders, data=Auto.train, size=3 )
# weights: 31
initial value 333.799856
final value 276.956630
converged
> summary(nn.class)
a 4-3-4 network with 31 weights
options were - softmax modelling
 b->h1 i1->h1 i2->h1 i3->h1 i4->h1
  0.39   0.69  -0.58   0.23   0.11
 b->h2 i1->h2 i2->h2 i3->h2 i4->h2
 -0.49  -0.67  -0.65  -0.68   0.11
 b->h3 i1->h3 i2->h3 i3->h3 i4->h3
 -0.67   0.24   0.59   0.43   0.00
 b->o1 h1->o1 h2->o1 h3->o1
  0.98  -0.25   0.32  -0.12
 b->o2 h1->o2 h2->o2 h3->o2
  0.13   0.41   0.19  -0.02
 b->o3 h1->o3 h2->o3 h3->o3
  0.20   0.28   0.34  -0.01
 b->o4 h1->o4 h2->o4 h3->o4
 -0.24   0.42  -0.09   0.41
Here K = 4 categories, so we are estimating (3)(5) + (4)(4) = 31 weights.
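Section 3 compared regression networks on test data, and a classifier can be checked the same way with a confusion table. The sketch below uses the built-in iris data rather than Auto so that it runs with nnet alone. The seed, the size of the split, and the object names train, fit and pred are arbitrary choices, not part of the original session.

```r
library(nnet)

set.seed(1)                        # arbitrary seed, for reproducibility only
train = sample(nrow(iris), 100)    # 100 training rows; the remaining 50 are test rows

# One hidden layer of 3 nodes; trace=FALSE suppresses the iteration log
fit = nnet(Species ~ ., data = iris[train, ], size = 3, trace = FALSE)

# Predicted class labels for the held-out rows
pred = predict(fit, iris[-train, ], type = "class")

# Confusion table and misclassification rate on the test data
print(table(Predicted = pred, Actual = iris$Species[-train]))
print(mean(pred != iris$Species[-train]))
```

For the Auto classifiers the analogous call would be predict(nn.class, Auto.test, type="class"), provided Auto.test contains the predictors and the class labels to compare against.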